The MiningMart Approach
Abstract
Abstraction: Meta-data are given at two levels of abstraction, a conceptual (abstract) level and a relational (executable) level. The same case at the conceptual level can be mapped onto several different cases at the relational level. This makes an abstract case re-usable.

Ease of case adaptation: To run a given sequence of operators on a new database, only the relational meta-data and their mapping onto the conceptual meta-data have to be written.

The MiningMart project has developed a model for meta-data together with its compiler, and implements human-computer interfaces that allow database managers and case designers to fill in their application-specific meta-data. The system will support preprocessing and can be used stand-alone or in combination with a toolbox for the data mining step.

Figure 1: Overview of the MiningMart system

2 System Architecture

This section gives an overview of how a case is represented at the meta-level, how it is practically applied to a database, and which steps need to be performed.

2.1 The Meta-Model of Meta-Data M4

The form in which meta-data are to be written is specified in the meta-model of meta-data, M4. It is structured along two dimensions, topic and abstraction. The topic is either the business data or the case. The business data are the data to be analysed. The case is a sequence of (preprocessing) steps. The abstraction is either conceptual or relational. Whereas the conceptual level is expected to be the same for various applications, the relational level refers to the particular database at hand. The meta-data written in the form specified by M4 are themselves stored in a relational database.

2.2 Editing the Conceptual Data Model

As depicted in Figure 1, different kinds of experts work at different ends of a knowledge discovery process. First of all, a domain expert defines a conceptual data model, using a concept editor. The entities involved in data mining are made explicit by this expert.
The conceptual model of M4 describes concepts, their features, and the relationships between concepts. Concepts and relationships may be organized hierarchically by means of inheritance. Examples of concepts are “Customer” and “Product”; a relationship between these two could be “Buys”.

2.3 Editing the Relational Model

Given a conceptual data model, a database administrator maps the involved entities to the corresponding database objects. The relational data model of M4 is capable of representing all the relevant properties of a relational database. The simplest mapping from the conceptual to the relational level arises when concepts directly correspond to database tables or views. This can always be achieved manually by inspecting the database and creating a view for each concept. However, more sophisticated ways of graphically selecting attributes (or “features”) and aggregating them into concepts increase acceptance by end users. In the project, the relational editor is intended to support this kind of activity. In general, it should be possible to map all reasonable representations of entities to reasonable conceptual definitions. A simple mapping of the concept “Customer”, containing the features “Customer ID”, “Name” and “Address”, to the database would be to state that the table “CUSTOMER” holds all the necessary attributes, e.g. “CUSTOM ID”, “CUST NAME” and “CUST ADDR”. A more complex mapping arises if the information about name and address first has to be joined, e.g. using the shared key attribute “CUSTOM ID”.

2.4 The Case and Its Compiler

All the information about the conceptual descriptions and about the corresponding database objects is represented within the M4 model and stored in relational tables. An “M4 case” denotes a collection of steps, basically performed sequentially, each of which changes or augments one or more concepts. Each step is related to exactly one operator and holds all of its input arguments.
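To illustrate the conceptual and relational meta-data referred to above, the simple “Customer” mapping from Section 2.3 can be sketched in Python. This is only a minimal sketch under stated assumptions: the class names `Concept` and `TableMapping` and the method `select_statement` are hypothetical and are not part of MiningMart or M4. The table and column names are copied as they appear in the text.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Concept:
    """Conceptual level: a named entity with features, independent of any database."""
    name: str
    features: Tuple[str, ...]


@dataclass
class TableMapping:
    """Relational level: ties each feature of a concept to a column of one table or view.

    Hypothetical helper for illustration; not the actual M4 schema.
    """
    concept: Concept
    table: str
    columns: Dict[str, str]  # feature name -> column name

    def select_statement(self) -> str:
        """Build SQL that presents the table in terms of the concept's features."""
        missing = [f for f in self.concept.features if f not in self.columns]
        if missing:
            raise ValueError(f"unmapped features: {missing}")
        cols = ", ".join(
            f'"{self.columns[f]}" AS "{f}"' for f in self.concept.features
        )
        return f'SELECT {cols} FROM "{self.table}"'


# The conceptual description is written once ...
customer = Concept("Customer", ("Customer ID", "Name", "Address"))

# ... and only this relational part changes when moving to another database.
mapping = TableMapping(
    concept=customer,
    table="CUSTOMER",
    columns={
        "Customer ID": "CUSTOM ID",
        "Name": "CUST NAME",
        "Address": "CUST ADDR",
    },
)

sql = mapping.select_statement()
print(sql)
```

Keeping the `Concept` object separate from the `TableMapping` mirrors the two abstraction levels: re-using a case on a new database would, in this sketch, mean supplying a different `TableMapping` for the same `Concept`.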
The M4 compiler reads the specifications of the steps and starts the corresponding executable operator, passing all the necessary inputs to it. Depending on the operator, the database might be altered. In any case, the M4 meta-data have to be updated by the compiler. A machine learning tool that replaces missing values is an example of an operator that alters the database. In contrast, for operators like a join it is sufficient to virtually add the resulting view, together with its corresponding SQL statement, to the meta-data.

The task of a case designer, ideally a data mining expert, is to find sequences of steps that result in a representation well suited for data mining. This work is supported by a special tool, the case editor. Usually many different operators are involved in a case of preprocessing steps. A list of available operators and their overall categories, e.g. Feature Construction, Clustering or Sampling, is part of the conceptual case model of M4. In every step the case designer chooses an applicable operator, sets all the parameters of the operator, assigns the input concepts, input attributes and/or input relations, and gives some specifics about the output.

Applicability is considered in two ways. On the one hand, constraints of operators can be checked on the basis of the meta-data; one example is the presence or absence of NULL values. On the other hand, conditions of operators can be checked on the basis of the business data when the case is run. Applicability constraints and conditions support the case designer by checking the validity of a sequence of steps while it is being created.

The sequence of steps, namely a case, transforms the original database into another representation. Each step and the ordering of the steps are formalized within M4, so the system automatically keeps track of the performed activities. This enables the user to interactively edit and replay the case or parts of it.
Furthermore, as soon as an efficient chain of preprocessing has been found, it can easily be exported by just submitting the conceptual meta-data.
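The interplay of steps, operators, applicability constraints and meta-data updates described in Section 2.4 can be sketched as follows. This is a toy illustration, not the M4 compiler: the names `Step`, `MetaData`, `run_case`, the two operators and the column names are all hypothetical. Filling NULLs with a constant stands in for the machine learning tool mentioned in the text, and a simple threshold feature stands in for an operator that, like a join, only adds a view to the meta-data.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of a case: exactly one operator plus all of its input arguments."""
    operator: str
    args: dict


@dataclass
class MetaData:
    """Hypothetical stand-in for the meta-data kept up to date by the compiler."""
    views: dict = field(default_factory=dict)      # view name -> defining SQL
    has_nulls: dict = field(default_factory=dict)  # (table, column) -> bool
    history: list = field(default_factory=list)    # steps executed so far


def replace_missing(db, meta, args):
    """Alters the database: fills NULLs with a constant (stand-in for a learned value)."""
    table, column = args["table"], args["column"]
    db.execute(
        f'UPDATE "{table}" SET "{column}" = ? WHERE "{column}" IS NULL',
        (args["value"],),
    )
    meta.has_nulls[(table, column)] = False


def threshold_feature(db, meta, args):
    """Leaves the database untouched: only records a view and its SQL in the meta-data."""
    table, column, feature = args["table"], args["column"], args["feature"]
    threshold = int(args["threshold"])
    meta.views[args["output"]] = (
        f'SELECT *, "{column}" >= {threshold} AS "{feature}" FROM "{table}"'
    )


def no_nulls(meta, args):
    """Constraint checked on meta-data only; unknown columns count as possibly NULL."""
    return not meta.has_nulls.get((args["table"], args["column"]), True)


# Operator name -> (implementation, applicability constraint or None)
OPERATORS = {
    "ReplaceMissingValues": (replace_missing, None),
    "ThresholdFeature": (threshold_feature, no_nulls),
}


def run_case(case, db, meta):
    """Execute the steps in order, checking constraints and logging each step."""
    for step in case:
        operator, constraint = OPERATORS[step.operator]
        if constraint is not None and not constraint(meta, step.args):
            raise ValueError(f"{step.operator} is not applicable to {step.args}")
        operator(db, meta, step.args)
        meta.history.append(step)


db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "CUSTOMER" ("CUSTOM_ID" INTEGER, "AGE" INTEGER)')
db.executemany('INSERT INTO "CUSTOMER" VALUES (?, ?)', [(1, 25), (2, None), (3, 61)])
meta = MetaData(has_nulls={("CUSTOMER", "AGE"): True})

case = [
    Step("ReplaceMissingValues", {"table": "CUSTOMER", "column": "AGE", "value": 40}),
    Step("ThresholdFeature", {"table": "CUSTOMER", "column": "AGE", "threshold": 40,
                              "feature": "SENIOR", "output": "CUSTOMER_SENIOR"}),
]
run_case(case, db, meta)
rows = db.execute(meta.views["CUSTOMER_SENIOR"]).fetchall()
print(sorted(rows))
```

Because the case is plain data (a list of `Step` objects) and every executed step is appended to `meta.history`, replaying the case or a part of it amounts to calling `run_case` again on the list or a slice of it. Running only the second step against meta-data that still report NULLs in `AGE` is rejected by the constraint check before the operator starts.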
Similar resources
3. The MiningMart Approach to Knowledge Discovery in Databases
Although preprocessing is one of the key issues in data analysis, it is still common practice to address this task by manually entering SQL statements and using a variety of stand-alone tools. The results are not properly documented and hardly re-usable. The MiningMart system presented in this chapter focuses on setting up and re-using best practice cases of preprocessing data stored in very la...
The MiningMart Approach to Knowledge Discovery in Databases
Although preprocessing is one of the key issues in data analysis, it is still common practice to address this task by manually entering SQL statements and using a variety of stand-alone tools. The results are not properly documented and hardly re-usable. The MiningMart system presented in this chapter focusses on setting up and re-using best-practice cases of preprocessing data stored in very l...
MiningMart: Sharing Successful KDD Processes
Abstraction: Metadata are given at different levels of abstraction, a conceptual and a relational level. This makes an abstract case understandable and re-usable. Data and Case documentation: The database objects (tables or views) as well as their conceptual counterparts are declaratively stored. So is the chain of preprocessing operations, including all operators’ parameter settings etc. All entities ...
Churn Prediction in Telecommunications Using MiningMart
This paper summarises a successful application of Knowledge Discovery in Databases (KDD) in an Italian telecommunications research lab. The aim of the application was to predict customer churn behaviour. A critical success factor for this application was clever preprocessing of the given data, in particular the construction of derived predictor features. The application was realised in the Mini...
Features for Learning Local Patterns in Time-Stamped Data
Time-stamped data occur frequently in real-world databases. The goal of analysing time-stamped data is very often to find a small group of objects (customers, machine parts,...) which is important for the business at hand. In contrast, the majority of objects obey well-known rules and is not of interest for the analysis. In terms of a classification task, the small group means that there are ve...